Ranking Policy Decisions

Neural Information Processing Systems

Policies trained via Reinforcement Learning (RL) without human intervention are often needlessly complex, making them difficult to analyse and interpret. In a run with $n$ time steps, a policy will make $n$ decisions on actions to take; we conjecture that only a small subset of these decisions delivers value over selecting a simple default action. Given a trained policy, we propose a novel black-box method based on statistical fault localisation that ranks the states of the environment according to the importance of decisions made in those states. We argue that among other things, the ranked list of states can help explain and understand the policy. As the ranking method is statistical, a direct evaluation of its quality is hard. As a proxy for quality, we use the ranking to create new, simpler policies from the original ones by pruning decisions identified as unimportant (that is, replacing them by default actions) and measuring the impact on performance. Our experimental results on a diverse set of standard benchmarks demonstrate that pruned policies can perform on a level comparable to the original policies. We show that naive approaches for ranking policies, e.g.
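The ranking-and-pruning idea above can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: a toy 10-state environment in which only two states actually require the policy's action, runs that randomly substitute the default action, and an Ochiai-style score (one common spectrum-based fault-localisation measure) used to rank states — this is not the paper's implementation.

```python
import random

random.seed(0)

N_STATES, N_RUNS = 10, 2000
CRITICAL = {2, 7}  # hypothetical: only these states need the policy's action

def run_passes(mask):
    """mask[s] True -> the policy's own action is used in state s.
    A run 'passes' iff every critical state received the policy's action."""
    return all(mask[s] for s in CRITICAL)

def ochiai(fail_cnt, pass_cnt, total_fail):
    # Ochiai-style suspiciousness: high when taking the default action
    # in this state co-occurs with failing runs.
    denom = (total_fail * (fail_cnt + pass_cnt)) ** 0.5
    return fail_cnt / denom if denom else 0.0

# Collect spectra: for each state, count failing/passing runs in which
# the default action was substituted there.
default_fail = [0] * N_STATES
default_pass = [0] * N_STATES
total_fail = 0
for _ in range(N_RUNS):
    mask = [random.random() < 0.5 for _ in range(N_STATES)]
    passed = run_passes(mask)
    total_fail += not passed
    for s in range(N_STATES):
        if not mask[s]:  # default action substituted in state s
            (default_pass if passed else default_fail)[s] += 1

scores = [ochiai(default_fail[s], default_pass[s], total_fail)
          for s in range(N_STATES)]
ranking = sorted(range(N_STATES), key=lambda s: -scores[s])
print(ranking)  # critical states should rank first
```

Pruning then amounts to keeping the policy's action only in the top-ranked states and using the default action everywhere else; in this toy, pruning all but the top two states leaves performance unchanged.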


Appendices A

Neural Information Processing Systems

We give examples of computing the path-specific harm in Appendices B-D. Omission Problem: Alice decides not to give Bob a set of golf clubs. According to the CCA, Alice's decision not to give Bob the clubs harms him: whatever utility function describes Bob's preferences, the counterfactual outcome 'Bob given clubs' leaves him better off. For example, if Bob's utility is U(y) = y (i.e. 1 for clubs, 0 for no clubs), the CCA concludes that Alice harmed Bob by not giving him the golf clubs. Note there are other reasonable scenarios where Alice's actions would constitute harm: a moment later, Eve would have robbed Bob of his clubs.








Clustered Policy Decision Ranking

Levin, Mark, Chockler, Hana

arXiv.org Artificial Intelligence

Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with n time steps, a policy will make n decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward, nor how significant their contribution is. Given a trained policy, we propose a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states. We compare our measure against a previous statistical fault-localization-based ranking procedure.
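A minimal sketch of the cluster-ranking idea, under assumed simplifications: states come pre-grouped into clusters, episodes randomly switch each whole cluster between the policy's action and a default one, and each cluster is scored by the sample covariance between that switch and the episode return. The environment, cluster assignment, and reward model are all hypothetical, not taken from the paper.

```python
import random

random.seed(1)

# Hypothetical clustering of nine states into three clusters; only
# cluster 1's decisions affect the toy reward.
CLUSTERS = {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8]}
IMPORTANT = 1

def episode(use_policy):
    """use_policy: cluster id -> True if the policy's action is used there.
    Reward = noise, plus 1 if the important cluster got the policy's action."""
    return random.gauss(0.0, 0.1) + (1.0 if use_policy[IMPORTANT] else 0.0)

# Sample episodes with each cluster independently toggled on or off.
samples = []
for _ in range(3000):
    use = {c: random.random() < 0.5 for c in CLUSTERS}
    samples.append((use, episode(use)))

def cov(c):
    # Sample covariance between "policy action used in cluster c" and return.
    xs = [float(u[c]) for u, _ in samples]
    ys = [r for _, r in samples]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

ranking = sorted(CLUSTERS, key=cov, reverse=True)
print(ranking)  # the important cluster should rank first
```

Ranking clusters rather than individual states amortises the estimation cost: one covariance estimate per cluster instead of one score per visited state.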


Counterfactual harm

Richens, Jonathan G., Beard, Rory, Thompson, Daniel H.

arXiv.org Artificial Intelligence

To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. However, to date there is no statistical method for measuring harm and factoring it into algorithmic decisions. In this paper we propose the first formal definition of harm and benefit using causal models. We show that any factual definition of harm must violate basic intuitions in certain scenarios, and show that standard machine learning algorithms that cannot perform counterfactual reasoning are guaranteed to pursue harmful policies following distributional shifts. We use our definition of harm to devise a framework for harm-averse decision making using counterfactual objective functions. We demonstrate this framework on the problem of identifying optimal drug doses using a dose-response model learned from randomized controlled trial data. We find that the standard method of selecting doses using treatment effects results in unnecessarily harmful doses, while our counterfactual approach allows us to identify doses that are significantly less harmful without sacrificing efficacy.
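A toy illustration of the contrast described above, with an entirely assumed dose-response model (not the paper's): sampling the treated and untreated outcomes jointly from shared latent noise makes per-patient counterfactual harm well defined, and a harm-penalised objective can then prefer a lower dose than the plain treatment-effect criterion when higher doses are riskier.

```python
import random

random.seed(3)

def sample_pair(d):
    """Jointly sample (no-treatment outcome, treated outcome) for one patient.
    Assumed model: mean benefit d*(1.5-d) peaks at d=0.75, but outcome
    variance grows with the dose, making high doses riskier."""
    u = random.gauss(0.0, 0.3)            # shared latent patient factor
    y0 = u                                # outcome with no treatment
    y1 = u + d * (1.5 - d) + random.gauss(0.0, 0.2 + 0.6 * d)
    return y0, y1

def evaluate(d, n=20000):
    """Monte Carlo estimates of treatment effect and counterfactual harm."""
    eff = harm = 0.0
    for _ in range(n):
        y0, y1 = sample_pair(d)
        eff += y1 - y0                    # treatment effect
        harm += max(0.0, y0 - y1)         # counterfactual harm: worse off treated
    return eff / n, harm / n

doses = [0.25, 0.5, 0.75, 1.0]
stats = {d: evaluate(d) for d in doses}

# Standard criterion: maximise the treatment effect alone.
best_effect = max(doses, key=lambda d: stats[d][0])
# Harm-averse criterion: treatment effect minus lam * counterfactual harm.
lam = 4.0
best_averse = max(doses, key=lambda d: stats[d][0] - lam * stats[d][1])
print(best_effect, best_averse)
```

In this toy the effect-maximising dose and the harm-averse dose differ even though both use the same learned response model; only the counterfactual objective sees the dose-dependent risk.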